decision threshold
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Oceania > Australia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.93)
meval: A Statistical Toolbox for Fine-Grained Model Performance Analysis
Sutariya, Dishantkumar, Petersen, Eike
Analyzing machine learning model performance stratified by patient and recording properties is becoming the accepted norm and often yields crucial insights about important model failure modes. Performing such analyses in a statistically rigorous manner is non-trivial, however. Appropriate performance metrics must be selected that allow for valid comparisons between groups of different sample sizes and base rates; metric uncertainty must be determined and multiple comparisons corrected for, in order to assess whether any observed differences may be purely due to chance; and in the case of intersectional analyses, mechanisms must be implemented to find the most 'interesting' subgroups within combinatorially many subgroup combinations. We here present a statistical toolbox that addresses these challenges and enables practitioners to easily yet rigorously assess their models for potential subgroup performance disparities. While broadly applicable, the toolbox is specifically designed for medical imaging applications. The analyses provided by the toolbox are illustrated in two case studies, one in skin lesion malignancy classification on the ISIC2020 dataset and one in chest X-ray-based disease classification on the MIMIC-CXR dataset.
- Health & Medicine > Diagnostic Medicine > Imaging (0.90)
- Health & Medicine > Therapeutic Area (0.66)
- Europe > United Kingdom > England > Bristol (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Education (0.65)
- Health & Medicine (0.47)
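The meval abstract above names two statistical steps: determining metric uncertainty and correcting for multiple comparisons. The following is a minimal sketch of one common way to do each, a percentile bootstrap and the Holm step-down correction. It is not the meval API: the function names `bootstrap_ci`, `holm_adjust`, and `accuracy` are hypothetical, and accuracy is only a placeholder for whichever metric suits the groups being compared.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        stats[b] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k).
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity along the sorted order and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

def accuracy(y_true, y_pred):
    """Placeholder metric (assumption); swap in a metric suited to the task."""
    return float(np.mean(y_true == y_pred))

# Hypothetical subgroup: 40 cases, 30 of them classified correctly.
labels = np.array([1] * 20 + [0] * 20)
preds = np.array([1] * 15 + [0] * 5 + [0] * 15 + [1] * 5)
print(bootstrap_ci(labels, preds, accuracy))

# Hypothetical raw p-values from three subgroup comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

Smaller subgroups produce wider bootstrap intervals, which is why raw differences in point estimates between groups of different sizes can be misleading without the interval.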
Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction
Yu, Yinan, Dippel, Falk, Lundberg, Christina E., Lindgren, Martin, Rosengren, Annika, Adiels, Martin, Sjöland, Helen
Objective: Machine learning (ML) predictive models are often developed without considering downstream value trade-offs and clinical interpretability. This paper introduces a cost-aware prediction (CAP) framework that combines ML prediction with cost-benefit analysis, assisted by large language model (LLM) agents, to communicate the trade-offs involved in applying ML predictions. Materials and Methods: We developed an ML model predicting 1-year mortality in patients with heart failure (N = 30,021, 22% mortality) to identify those eligible for home care. We then introduced clinical impact projection (CIP) curves to visualize important cost dimensions (quality of life and healthcare provider expenses, further divided into treatment and error costs) and to assess the clinical consequences of predictions. Finally, we used four LLM agents to generate patient-specific descriptions. The system was evaluated by clinicians for its decision support value. Results: The eXtreme gradient boosting (XGB) model achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.804 (95% confidence interval (CI) 0.792-0.816), area under the precision-recall curve (AUPRC) of 0.529 (95% CI 0.502-0.558) and a Brier score of 0.135 (95% CI 0.130-0.140). Discussion: The CIP cost curves provided a population-level overview of cost composition across decision thresholds, whereas the LLM agents generated cost-benefit analyses at the individual patient level. The system was well received according to the evaluation by clinicians. However, feedback emphasized the need to strengthen technical accuracy for speculative tasks. Conclusion: CAP utilizes LLM agents to integrate ML classifier outcomes and cost-benefit analysis for more transparent and interpretable decision support.
- Europe > Sweden > Västra Götaland > Gothenburg (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Sweden > Västerbotten County > Umeå (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Oceania > Australia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.93)
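The CAP abstract above describes cost composition across decision thresholds and reports a Brier score. As a rough illustration of the general idea only, the sketch below computes the average error cost per patient at each threshold, plus the Brier score. It is not the paper's CIP curve: the function names, the two unit costs, and the toy data are all assumptions made for this example.

```python
import numpy as np

def cost_curve(y_true, y_prob, cost_fp, cost_fn, thresholds):
    """Average error cost per case at each decision threshold.

    cost_fp and cost_fn are hypothetical unit costs for a false positive
    and a false negative; a case is predicted positive when y_prob >= t.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    costs = []
    for t in thresholds:
        pred = y_prob >= t
        n_fp = np.sum(pred & ~y_true)   # flagged, but outcome did not occur
        n_fn = np.sum(~pred & y_true)   # missed, but outcome occurred
        costs.append((cost_fp * n_fp + cost_fn * n_fn) / len(y_true))
    return np.array(costs)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

# Toy data (assumption): four patients, two of whom had the outcome.
outcomes = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
grid = np.linspace(0.0, 1.0, 11)

curve = cost_curve(outcomes, probs, cost_fp=1.0, cost_fn=5.0, thresholds=grid)
best = grid[np.argmin(curve)]  # threshold with the lowest average cost
print(curve, best, brier_score(outcomes, probs))
```

When a false negative is assumed to cost more than a false positive, as here, the cost-minimizing threshold tends to sit below 0.5.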
Critical appraisal of artificial intelligence for rare-event recognition: principles and pharmacovigilance case studies
Noren, G. Niklas, Meldau, Eva-Lisa, Ellenius, Johan
Many high-stakes AI applications target low-prevalence events, where apparent accuracy can conceal limited real-world value. Relevant AI models range from expert-defined rules and traditional machine learning to generative LLMs constrained for classification. We outline key considerations for critical appraisal of AI in rare-event recognition, including problem framing and test set design, prevalence-aware statistical evaluation, robustness assessment, and integration into human workflows. In addition, we propose an approach to structured case-level examination (SCLE), to complement statistical performance evaluation, and a comprehensive checklist to guide procurement or development of AI models for rare-event recognition. We instantiate the framework in pharmacovigilance, drawing on three studies: rule-based retrieval of pregnancy-related reports; duplicate detection combining machine learning with probabilistic record linkage; and automated redaction of person names using an LLM. We highlight pitfalls specific to the rare-event setting, including optimism from unrealistic class balance and a lack of difficult positive controls in test sets, and show how cost-sensitive targets align model performance with operational value. While grounded in pharmacovigilance practice, the principles generalize to domains where positives are scarce and error costs may be asymmetric.
- Information Technology (0.93)
- Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.48)
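The rare-event entry above warns about optimism from unrealistic class balance. A small worked example of why this matters, using Bayes' theorem: with sensitivity and specificity held fixed, the positive predictive value drops sharply as prevalence falls. The function name and the numbers below are illustrative assumptions, not figures from the paper.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged cases that are true positives, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        return float("nan")  # nothing is ever flagged
    return true_pos / (true_pos + false_pos)

# Same hypothetical classifier, evaluated at two prevalences.
balanced = positive_predictive_value(0.9, 0.9, 0.50)  # balanced test set
rare = positive_predictive_value(0.9, 0.9, 0.01)      # 1% prevalence
print(round(balanced, 3), round(rare, 3))
```

In this example, a classifier whose flags are correct 90% of the time on a balanced test set is correct only about 8% of the time at 1% prevalence, so performance measured at an unrealistic class balance overstates operational value.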
The Impact of Pseudo-Science in Financial Loans Risk Prediction
Scarone, Bruno, Baeza-Yates, Ricardo
We study the societal impact of pseudo-scientific assumptions for predicting the behavior of people in a straightforward application of machine learning to risk prediction in financial lending. This use case also exemplifies the impact of survival bias in loan return prediction. We analyze the models in terms of their accuracy and social cost, showing that the socially optimal model may not imply a significant accuracy loss for this downstream task. Our results are verified for commonly used learning methods and datasets. Our findings also show a natural dynamic when training models that suffer from survival bias: accuracy slightly deteriorates, while recall and precision improve over time. These results create an illusion, leading the observer to believe that the system is getting better, when in fact the model is suffering from increasing unfairness and survival bias.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Colorado (0.04)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
- Banking & Finance > Loans (1.00)
- Government (0.93)
Towards Improved Research Methodologies for Industrial AI: A case study of false call reduction
Pfab, Korbinian, Rothering, Marcel
Are current artificial intelligence (AI) research methodologies ready to create successful, productive, and profitable AI applications? This work presents a case study on an industrial AI use case called false call reduction for automated optical inspection to demonstrate the shortcomings of current best practices. We identify seven weaknesses prevalent in related peer-reviewed work and experimentally show their consequences. We show that the best-practice methodology would fail for this use case. We argue, among other things, for the necessity of requirement-aware metrics to ensure that business objectives are achieved, clear definitions of success criteria, and a thorough analysis of temporal dynamics in experimental datasets. Our work encourages researchers to critically assess their methodologies for more successful applied AI research. The rise of automation in manufacturing has brought significant advancements to production processes. However, are current artificial intelligence (AI) research methodologies ready to create successful, productive, and profitable AI applications? Despite extensive research, the success of industrial AI applications has not kept pace with other industrial automation technologies due to methodological weaknesses. In this work, we address these methodological flaws using a case study on false call reduction in automated optical inspection (AOI) of printed circuit boards (PCBs). AOI systems, which use computer vision to inspect soldering quality, often produce a high number of false calls: incorrect classifications of non-defective PCBs as defective. These false calls consume valuable human resources in manual inspection stages. Our study identifies seven prevalent weaknesses in related research on this topic and demonstrates their negative impacts experimentally.
- Europe > Germany (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Anticipating Gaming to Incentivize Improvement: Guiding Agents in (Fair) Strategic Classification
Alhanouti, Sura, Naghizadeh, Parinaz
While the use of ML-driven systems can enhance efficiency, it can also drive the humans who are subject to algorithmic decisions to adjust their behavior accordingly. Examples include Uber drivers coordinating their behavior in response to its surge pricing algorithm [Möhlmann and Zalmanson, 2017], applicants selecting keywords and formatting to pass automated resume screening [Forbes, 2022], and Facebook users adjusting their posting and content interaction choices in response to the platform's curation algorithms [Eslami et al., 2016]. These can be viewed as strategic responses by rational human subjects in these systems, motivating a game-theoretical analysis of learning algorithms with humans in the loop. Earlier works on the study of strategic humans facing ML systems largely focused on scenarios where users can strategically alter only their observable data (e.g., students cheating to obtain better test scores, job applicants making formatting or wording changes to their CV, or loan applicants opening several new accounts to increase their credit scores) to receive a favorable decision (e.g., be accepted to a school, job opening, or loan); see, e.g., [Hu et al., 2019, Milli et al., 2019]. This strategic behavior is referred to as strategic manipulation, where agents change their features without changing their true qualification states. This can be interpreted as cheating the machine learning algorithm: such agents may appear to be more qualified, without being truly suitable for a favorable outcome.
- Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
- Asia > Middle East > Israel > Southern District > Eilat (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Banking & Finance > Credit (0.34)
- Education > Assessment & Standards > Student Performance (0.34)